Use time fields in sdselect searches
This topic is directed towards users of Federated Search for Amazon S3 who run sdselect
searches over federated indexes that map to customer-created AWS Glue tables.
If you use Amazon Security Lake federated indexes or Amazon S3 federated indexes that map to Splunk-managed AWS Glue tables, you can skip this topic. You do not declare time fields in advance when you define those kinds of federated indexes, and you will use _time
as the time field in sdselect
searches that reference those federated indexes.
When you run an sdselect
search that involves time fields over an Amazon S3 federated index that is mapped to a customer-managed AWS Glue table dataset, you might encounter difficulties if you use the original time field from the AWS Glue table in your search. Instead, for best results, set the Unix time field for the federated index that maps to that dataset to _time and then use _time
as your time field throughout your search.
When you must use the Unix time field for your sdselect searches
When you define an Amazon S3 federated index that maps to a customer-managed AWS Glue table, you define the Time field, Time format, and Unix time field for the dataset the index is mapped to. The Time field is the actual time field in your AWS Glue table data, and the Time format is the format of that time field.
The Unix time field is a field alias of the Time field that accurately converts the Time field values into numeric UNIX time format values, even if the Time field has a string UNIX time format, a custom string format, or a SQL timestamp
data type.
The Unix time field defaults to _time
. The _time
field is useful because Splunk software treats _time
as a field with both numeric and string properties.
- Because
_time
values are in numeric UNIX time format, you can use_time
in functions and operations that require numeric values. - However, when it makes sense to do so, Splunk Web translates
_time
values into human-readable timestamp strings that display with the time zone of the Splunk platform deployment running the search.
The following table shows you how different kinds of Time field fields match up against Unix time field fields when you apply time fields to sdselect
use cases such as numeric stats aggregation functions, WHERE clauses, and GROUPBY clauses with span
arguments.
Follow only the Unix time field row in this table if your sdselect
searches use Amazon Security Lake federated indexes or Amazon S3 indexes that map to Splunk-managed AWS Glue tables.
You will always use _time
as the time field in sdselect
searches that use those kinds of federated indexes.
Type of time field | Time field format | Can I apply the time field to a WHERE clause? | Can I apply the time field to numeric stats aggregation functions such as avg() or sum()? | Can I apply the time field to a GROUPBY clause with a span argument? |
---|---|---|---|---|
Time field | %UT (Numeric UNIX time format) | Yes | Yes | No |
Time field | Custom string format, defined with a series of time format variables. Can also be %s for string UNIX time format. | Yes, but the search might return incorrect results due to lexicographical1 comparison. | Yes | No |
Time field | %ST (SQL timestamp datatype)
|
No | No | No |
Unix time field | Numeric UNIX time format | Yes | Yes | Yes |
1 Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII. Lexicographical order sorts numbers before letters, based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
If you are defining an Amazon S3 federated index for a customer-managed Glue table, and you prefer to see your human-readable timestamps replaced by numeric UNIX time format values, change the value of Unix time field to a different field name than _time. The Unix time field value cannot be shared by field names in the AWS Glue table dataset to which the federated index maps.
To learn more about declaring the Time field, Time format, and Unix time field for Amazon S3 federated indexes that map to customer-created AWS Glue datasets, see Map an Amazon S3 federated index to a customer-created AWS Glue table dataset.
For more information about Amazon S3 federated indexes that map to Splunk-managed AWS Glue table datasets, see Map an Amazon S3 federated index to a Splunk-managed AWS Glue table dataset.
For more information about Amazon Security Lake federated indexes, see Map Amazon Security Lake federated indexes to AWS Glue tables.
For more information about applying time fields to sdselect
WHERE clauses, see Apply date and time eval functions to fields in the WHERE clause.
About using the time range picker in sdselect searches that use declared time fields
If you set up time range filters in an sdselect
search that uses the declared Time field, Unix time field, or Time partition field from the federated index definition and then select a time range using the time range picker, the sdselect
command ignores the time range picker selection when you run the search.
If your time range filters use time fields that are not declared in the federated index definition, the sdselect
command uses the time range in conjunction with the time range picker selection.
When the Time field has values with the SQL timestamp data type
This section applies only to users of Federated Search for Amazon S3.
Amazon S3 datasets might include time fields with SQL timestamp
data type values. The sdselect
command does not natively support the timestamp
data type. The sdselect
command converts timestamp
values into a string with a yyyy-mm-dd hh:mm:ss[.nnnnnnnnn]
format, which means that timestamp
values do not work for aspects of sdselect
searches that require numeric values, such as stats aggregation functions like avg()
and sum()
.
The sdselect
command displays timestamp
values in the Coordinated Universal Time (UTC) time zone.
If the Time field for your federated index has a SQL timestamp
data type, your best option is to set the Unix time field for that federated index to _time. The Unix time field is an alias for the Time field. The Unix time field converts the SQL timestamp
data type values of the Time field into numeric UNIX time format. This conversion is perfect for functions that require numeric time field values.
Splunk Web displays _time
values in human-readable string format where it is appropriate to do so, in the local time zone for your Splunk platform deployment.
This example involves a federated index named Time_Field_Timestamp_Data_Type. The time field values for this federated index are as follows:
Time field | Time format | Unix time field |
---|---|---|
SQLTimestampTime | %ST | _time |
The Time format variable of %ST indicates that SQLTimestampTime, the declared Time field for this federated index, has values of the SQL timestamp
data type.
Here is a search that summarizes the information contained in the AWS Glue table dataset to which the federated index maps.
| sdselect scan_count, user, SQLTimestampTime, _time FROM Time_Field_Timestamp_Data_Type
Here are the results of that search. Notice that the timestamp
values for SQLTimestampTime
, the Time field from the AWS Glue table, have the UTC time zone, while the timestamps for _time
, the Unix time field, are the same values in Pacific Standard Time (PST), the time zone local to the Splunk platform deployment the search ran upon.
scan_count | user | SQLTimestampTime | _time |
---|---|---|---|
95 | Jakob | 2023-03-08 23:48:27.000 | 2023-03-08 15:48:27 |
70 | Annabelle | 2023-03-08 23:49:37.000 | 2023-03-08 15:49:37 |
88 | Jakob | 2023-03-09 00:49:37.000 | 2023-03-09 16:49:37 |
69 | Aki | 2023-03-09 00:50:12.000 | 2023-03-09 16:50:12 |
39 | Sameera | 2023-03-09 00:51:12.000 | 2023-03-09 16:51:12 |
70 | Annabelle | 2023-03-10 00:51:21.000 | 2023-03-10 16:51:21 |
99 | Larry | 2023-03-12 09:29:32.000 | 2023-03-12 01:29:32 |
99 | Ramon | 2023-03-13 04:52:16.000 | 2023-03-13 21:52:16 |
39 | Sameera | 2023-03-13 04:53:16.000 | 2023-03-13 21:53:16 |
Now, here is an sdselect
search that is run against that dataset. Notice that this search uses the _time
field in several spots, and that the WHERE clause compares _time
against a time value in numeric UNIX time format. This UNIX time value is equivalent to March 9, 2023 4:51:21 PM (PT), which means that the search filters events for Larry and Ramon out of its results.
| sdselect count, values(user), values(scan_count), avg(_time), min(SQLTimestampTime), max(SQLTimestampTime) FROM Time_Field_Timestamp_Data_Type WHERE _time < 1678409481 GROUPBY _time span=4h
The search returns the following results.
_time | count | values(user) | values(scan_count) | avg(_time) | min(SQLTimestampTime) | max(SQLTimestampTime) |
---|---|---|---|---|---|---|
2023-03-08 12:00 | 2 | Annabelle Jakob |
70 95 |
1678319342.000000 | 2023-03-08 23:48:27.000 | 2023-03-08 23:49:37.000 |
2023-03-08 16:00 | 3 | Sameera Aki Jakob |
39 69 88 |
16783233020.333333 | 2023-03-09 00:49:37.000 | 2023-03-09 00:51:12.000 |
If the search uses SQLTimestampTime
for the avg()
function, the WHERE clause, or for the GROUPBY clause with the span
argument, the search fails and returns error messages. This happens for the following reasons:
SQLTimestampTime
has time field values with the SQLtimestamp
data type, whichsdselect
cannot apply to stats aggregation functions that require numeric values, such asavg()
andsum()
.sdselect
cannot apply time field values with the SQLtimestamp
data type to WHERE clauses.sdselect
requires that you apply the Unix time field to GROUPBY clauses with aspan
argument.
When you use _time
, the Unix time field alias of SQLTimestampTime
, the search works as expected.
See also
sdselect command WHERE clause operations | sdselect command examples for Amazon S3 |
This documentation applies to the following versions of Splunk Cloud Platform™: 9.3.2408
Feedback submitted, thanks!